Diploma Thesis Analysis and Comparison of Existent Information Extraction Methods

Author

  • Jun Ying
Abstract

Information extraction was initially applied to identify desired information in natural language documents and to convert the extracted text into a self-defined presentation. With the rapidly increasing number of information sources and electronic documents available on the World Wide Web, information extraction has been extended to structured and semi-structured web pages. In recent years, many solutions have been described and various information extraction systems have been implemented. In this thesis, we present a theoretical analysis and comparison of several information extraction systems in two sub-areas. On the one hand, the systems differ from each other in the features used for identification. We therefore compare several aspects of different information extraction systems, such as the pre-processing used to generate features, the various constraints used to characterize target information, and the different representations of extraction patterns, namely, how the constraints are utilized. On the other hand, in order to reduce human effort and improve the portability of information extraction systems, diverse machine learning techniques are applied to build them. In this thesis, we present various types of training data and introduce the active learning technique, the boosting algorithm, and the different rule learning algorithms used in information extraction systems. Most of the information extraction systems discussed in this thesis employ rule learning techniques; for these, the structure, the evaluation heuristics, and the pruning methods are compared. In addition, Bayesian learning as applied in information extraction systems is presented.

Similar resources

A Comparative Study of Visualization Techniques for Data Mining

Declaration This thesis contains no material that has been accepted for the award of any other degree or diploma in any other university. To the best of my knowledge and belief, the thesis contains no material previously published or written by any other person, except where due reference is made in the text of the thesis. i Acknowledgements I would like to thank the following people for their ...

Evolving Code Clones

The goal of this project plan is to give a short overview of the topic that underlies the diploma thesis. It describes the main tasks and outcomes that should be achieved during this thesis. Since time management is an essential part of writing a diploma thesis, this plan also serves as a guideline to help avoid losing control over the allocated time. The thesis consists of 5 tasks (See [1] fo...

Developing a New Method in Object Based Classification to Updating Large Scale Maps with Emphasis on Building Feature

As cities expand, updating urban maps for urban planning is important, and its effectiveness depends on the accuracy of information extraction / change detection. Information extraction methods are divided into two groups: Pixel-Based (PB) and Object-Based (OB). OB analysis has overcome the limitations of PB analysis (producing salt-pepper results and features with hole...

Publication date: 2006